Skip to content

[Canonical LoRA] fix: forward under expert layers#1628

Merged
yaoyu-33 merged 1 commit intoNVIDIA-NeMo:mainfrom
HollowMan6:canonical_lora_experts
Dec 8, 2025
Merged

[Canonical LoRA] fix: forward under expert layers#1628
yaoyu-33 merged 1 commit intoNVIDIA-NeMo:mainfrom
HollowMan6:canonical_lora_experts

Conversation

@HollowMan6
Copy link
Contributor

@HollowMan6 HollowMan6 commented Dec 7, 2025

What does this PR do ?

Fix the following error:

  File "megatron/core/transformer/transformer_layer.py", line 673, in _forward_mlp
    mlp_output_with_bias = self.mlp(pre_mlp_layernorm_output)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "megatron/core/transformer/moe/moe_layer.py", line 325, in forward
    outputs = custom_forward(hidden_states)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "megatron/core/transformer/moe/moe_layer.py", line 309, in custom_forward
    output, mlp_bias = self.routed_experts_compute(dispatched_input, probs, residual)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "megatron/core/transformer/moe/moe_layer.py", line 254, in routed_experts_compute
    expert_output, mlp_bias = self.experts(dispatched_input, tokens_per_expert, permuted_probs)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "megatron/core/transformer/moe/experts.py", line 927, in forward
    fc1_output, bias_parallel = self.linear_fc1(
                                ^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: LoRALinearSplitFC1UpGate.forward() takes 2 positional arguments but 3 were given
  File "megatron/core/transformer/moe/moe_layer.py", line 325, in forward
    outputs = custom_forward(hidden_states)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "megatron/core/transformer/moe/moe_layer.py", line 309, in custom_forward
    output, mlp_bias = self.routed_experts_compute(dispatched_input, probs, residual)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "megatron/core/transformer/moe/moe_layer.py", line 254, in routed_experts_compute
    expert_output, mlp_bias = self.experts(dispatched_input, tokens_per_expert, permuted_probs)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "megatron/core/transformer/moe/experts.py", line 927, in forward
    fc1_output, bias_parallel = self.linear_fc1(
                                ^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/Megatron-Bridge/src/megatron/bridge/peft/canonical_lora.py", line 105, in forward
    adapter_output = torch.cat([adapter_output_gate, adapter_output_up], dim=2)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)

Changelog

  • Forward accepts and pass arbitrary number of args to base_linear_forward
  • adapter_output uses the last dim for cat

GitHub Actions CI

See the CI sectionin the Contributing doc for how to trigger the CI. A Nvidia developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items you can still open "Draft" PR.

Additional Information

  • Related to # (issue)

✨ Presented to you with Mind Lab - A Lab for Experiential Intelligence.

Fix the following error:

```log
  File "megatron/core/transformer/transformer_layer.py", line 673, in _forward_mlp
    mlp_output_with_bias = self.mlp(pre_mlp_layernorm_output)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "megatron/core/transformer/moe/moe_layer.py", line 325, in forward
    outputs = custom_forward(hidden_states)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "megatron/core/transformer/moe/moe_layer.py", line 309, in custom_forward
    output, mlp_bias = self.routed_experts_compute(dispatched_input, probs, residual)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "megatron/core/transformer/moe/moe_layer.py", line 254, in routed_experts_compute
    expert_output, mlp_bias = self.experts(dispatched_input, tokens_per_expert, permuted_probs)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "megatron/core/transformer/moe/experts.py", line 927, in forward
    fc1_output, bias_parallel = self.linear_fc1(
                                ^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: LoRALinearSplitFC1UpGate.forward() takes 2 positional arguments but 3 were given
```

```log
  File "megatron/core/transformer/moe/moe_layer.py", line 325, in forward
    outputs = custom_forward(hidden_states)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "megatron/core/transformer/moe/moe_layer.py", line 309, in custom_forward
    output, mlp_bias = self.routed_experts_compute(dispatched_input, probs, residual)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "megatron/core/transformer/moe/moe_layer.py", line 254, in routed_experts_compute
    expert_output, mlp_bias = self.experts(dispatched_input, tokens_per_expert, permuted_probs)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "megatron/core/transformer/moe/experts.py", line 927, in forward
    fc1_output, bias_parallel = self.linear_fc1(
                                ^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/Megatron-Bridge/src/megatron/bridge/peft/canonical_lora.py", line 105, in forward
    adapter_output = torch.cat([adapter_output_gate, adapter_output_up], dim=2)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)
```

Signed-off-by: Hollow Man <hollowman@opensuse.org>
@copy-pr-bot
Copy link

copy-pr-bot bot commented Dec 7, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yaoyu-33
Copy link
Contributor

yaoyu-33 commented Dec 7, 2025

/ok to test d839f2e

@yaoyu-33 yaoyu-33 enabled auto-merge (squash) December 7, 2025 19:15
@yaoyu-33 yaoyu-33 merged commit b2f62a7 into NVIDIA-NeMo:main Dec 8, 2025
47 checks passed
matthew-frank pushed a commit to matthew-frank/Megatron-Bridge that referenced this pull request Dec 8, 2025
Signed-off-by: Hollow Man <hollowman@opensuse.org>
@HollowMan6 HollowMan6 deleted the canonical_lora_experts branch December 8, 2025 21:32
wuxibin89 pushed a commit to verl-project/verl that referenced this pull request Dec 16, 2025
### What does this PR do?

Now that several important fixes have been merged into Megatron-Bridge,
it's better to update the instructions so that everything can really
work correctly.

Related to:
- NVIDIA-NeMo/Megatron-Bridge#1564
- NVIDIA-NeMo/Megatron-Bridge#1603
- NVIDIA-NeMo/Megatron-Bridge#1627
- NVIDIA-NeMo/Megatron-Bridge#1628

Fix: #4303

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

<sub>✨ Presented to you with <a href="https://macaron.im/mindlab">Mind
Lab</a> - A Lab for Experiential Intelligence.</sub>

Signed-off-by: Hollow Man <hollowman@opensuse.org>
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026
…roject#4533)

### What does this PR do?

Now that several important fixes have been merged into Megatron-Bridge,
it's better to update the instructions so that everything can really
work correctly.

Related to:
- NVIDIA-NeMo/Megatron-Bridge#1564
- NVIDIA-NeMo/Megatron-Bridge#1603
- NVIDIA-NeMo/Megatron-Bridge#1627
- NVIDIA-NeMo/Megatron-Bridge#1628

Fix: verl-project#4303

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

<sub>✨ Presented to you with <a href="https://macaron.im/mindlab">Mind
Lab</a> - A Lab for Experiential Intelligence.</sub>

Signed-off-by: Hollow Man <hollowman@opensuse.org>
sophiayyya pushed a commit to sophiayyya/verl that referenced this pull request Jan 25, 2026
…roject#4533)

### What does this PR do?

Now that several important fixes have been merged into Megatron-Bridge,
it's better to update the instructions so that everything can really
work correctly.

Related to:
- NVIDIA-NeMo/Megatron-Bridge#1564
- NVIDIA-NeMo/Megatron-Bridge#1603
- NVIDIA-NeMo/Megatron-Bridge#1627
- NVIDIA-NeMo/Megatron-Bridge#1628

Fix: verl-project#4303

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

<sub>✨ Presented to you with <a href="https://macaron.im/mindlab">Mind
Lab</a> - A Lab for Experiential Intelligence.</sub>

Signed-off-by: Hollow Man <hollowman@opensuse.org>
y-a23 pushed a commit to y-a23/query that referenced this pull request Feb 5, 2026
### What does this PR do?

Now that several important fixes have been merged into Megatron-Bridge,
it's better to update the instructions so that everything can really
work correctly.

Related to:
- NVIDIA-NeMo/Megatron-Bridge#1564
- NVIDIA-NeMo/Megatron-Bridge#1603
- NVIDIA-NeMo/Megatron-Bridge#1627
- NVIDIA-NeMo/Megatron-Bridge#1628

Fix: verl-project/verl#4303

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

<sub>✨ Presented to you with <a href="https://macaron.im/mindlab">Mind
Lab</a> - A Lab for Experiential Intelligence.</sub>

Signed-off-by: Hollow Man <hollowman@opensuse.org>
KimperYang pushed a commit to KimperYang/TauVerl that referenced this pull request Mar 3, 2026
### What does this PR do?

Now that several important fixes have been merged into Megatron-Bridge,
it's better to update the instructions so that everything can really
work correctly.

Related to:
- NVIDIA-NeMo/Megatron-Bridge#1564
- NVIDIA-NeMo/Megatron-Bridge#1603
- NVIDIA-NeMo/Megatron-Bridge#1627
- NVIDIA-NeMo/Megatron-Bridge#1628

Fix: verl-project/verl#4303

### Checklist Before Starting

- [X] Search for similar PRs. Paste at least one query link here: ...
- [X] Format the PR title as `[{modules}] {type}: {description}` (This
will be checked by the CI)
- `{modules}` include `fsdp`, `megatron`, `sglang`, `vllm`, `rollout`,
`trainer`, `ci`, `training_utils`, `recipe`, `hardware`, `deployment`,
`ray`, `worker`, `single_controller`, `misc`, `perf`, `model`, `algo`,
`env`, `tool`, `ckpt`, `doc`, `data`
- If this PR involves multiple modules, separate them with `,` like
`[megatron, fsdp, doc]`
  - `{type}` is in `feat`, `fix`, `refactor`, `chore`, `test`
- If this PR breaks any API (CLI arguments, config, function signature,
etc.), add `[BREAKING]` to the beginning of the title.
  - Example: `[BREAKING][fsdp, megatron] feat: dynamic batching`

### Test

> For changes that can not be tested by CI (e.g., algorithm
implementation, new model support), validate by experiment(s) and show
results like training curve plots, evaluation results, etc.

### API and Usage Example

> Demonstrate how the API changes if any, and provide usage example(s)
if possible.

```python
# Add code snippet or script demonstrating how to use this
```

### Design & Code Changes

> Demonstrate the high-level design if this PR is complex, and list the
specific changes.

### Checklist Before Submitting

> [!IMPORTANT]
> Please check all the following items before requesting a review,
otherwise the reviewer might deprioritize this PR for review.

- [X] Read the [Contribute
Guide](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md).
- [X] Apply [pre-commit
checks](https://github.com/volcengine/verl/blob/main/CONTRIBUTING.md#code-linting-and-formatting):
`pre-commit install && pre-commit run --all-files --show-diff-on-failure
--color=always`
- [X] Add / Update [the
documentation](https://github.com/volcengine/verl/tree/main/docs).
- [X] Add unit or end-to-end test(s) to [the CI
workflow](https://github.com/volcengine/verl/tree/main/.github/workflows)
to cover all the code. If not feasible, explain why: ...
- [X] Once your PR is ready for CI, send a message in [the `ci-request`
channel](https://verl-project.slack.com/archives/C091TCESWB1) in [the
`verl` Slack
workspace](https://join.slack.com/t/verl-project/shared_invite/zt-3855yhg8g-CTkqXu~hKojPCmo7k_yXTQ).
(If not accessible, please try [the Feishu group
(飞书群)](https://applink.larkoffice.com/client/chat/chatter/add_by_link?link_token=772jd4f1-cd91-441e-a820-498c6614126a).)

<sub>✨ Presented to you with <a href="https://macaron.im/mindlab">Mind
Lab</a> - A Lab for Experiential Intelligence.</sub>

Signed-off-by: Hollow Man <hollowman@opensuse.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants